The first two chunks of this r markdown file after the r setup allow for plot zooming, but it also means that the html file must be opened in a browser to view the document properly. When it knits in RStudio the preview will appear empty but the html when opened in a browser will have all the info and you can click on each plot to Zoom in on it.
A few notes about this script.
If you are running this make sure you download the whole SRFN (GitHub repository)[https://github.com/marissadyck/SRFN_ACME_Camera_Project] from my GitHub. This will ensure you have all the files, data, and proper folder structure you will need to run this code and associated analyses.
Also make sure you open RStudio through the R project (ARFN_ACME_Camera_Project.Rproj) this will automatically set your working directory to the correct place (wherever you saved the repository and it’s files) and ensure you don’t have to change the file paths for some of the data.
Lastly, if you are looking to adapt this code for a future year of data, you will want to ensure you have run all the code through 2_ACME_SRFN_landscape_covariate_exploration_script.Rmd with your data as there is much data formatting, cleaning, and restructuring that has to be done before this code will work. Helpful note: The files are numbered in the order they are used to prep for this analysis.
If you have question please email the most recent author, currently
Marissa A. Dyck
Postdoctoral research fellow
University of Victoria
School of Environmental Studies
Email: marissadyck17@gmail.com
Before starting you should ensure you have the latest version of R and RStudio downloaded. This code was generated under R version 4.2.3 and with RStudio version 2024.04.2+764.
You can download R and RStudio HERE
This script is written in R markdown and thus uses a mix of coding markup languages and R. If you are planning to run this script with new data or make any modifications you will want to be familiar with some basics of R markdown.
Below is an R markdown cheatsheet to help you get started,
R
markdown cheatsheet
If you don’t already have the following packages installed, use the code below to install them. *NOTE this will not run automatically as eval=FALSE is included in the chunk setup (i.e. I don’t want it to run every time I run this code since I have the packages installed).
install.packages('tidyverse')
install.packages('ggpubr')
install.packages('corrplot')
install.packages('Hmisc')
install.packages('glmmTMB')
install.packages('MuMIn')
install.packages('TMB', type = 'source')
install.packages('rphylopic')
install.packages('broom')
install.packages('lme4')
install.packages('car')
install.packages('ggeffects')
install.packages('flextable', type = 'binary')
install.packages('gfonts') # needed for flextable
install.packages('officer') # needed for flextable
Then load the packages to your library so they are usable for this session.
library(tidyverse) # data tidying, visualization, and much more; this will load all tidyverse packages, can see complete list using tidyverse_packages()
library(ggpubr) # make modifications to plot for publication (arrange plots)
## Warning: package 'ggpubr' was built under R version 4.5.1
library(PerformanceAnalytics) # Used to generate a correlation plot
library(Hmisc) # used to generate histograms for all variables in data frame
library(glmmTMB) # Constructing GLMMs
library(MuMIn) # for model selection
library(rphylopic) # add animal silhouettes to graphs
library(broom) # extracting odds ratios in a tidy format
library(lme4) # constructing generalized linear mixed effects models
library(car) # used for calculating variance inflation factor (VIF) to assess model fit
library(ggeffects) # for extracting predicted probabilities from glms for plotting
I’ve added this short section, while not directly related to the analysis, will provide some summary statistics in one convenient location to report in results etc.
First let’s read in the cleaned raw data with all the timelapse images and drop any with no species id
timelapse <- read_csv('data/raw/srfn_timelapse_data.csv',
col_types = cols(species = col_factor())) %>%
drop_na(species)
## New names:
## • `` -> `...1`
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
The number of observations of this timelapse object gives us the total non-blank images from the study
levels(timelapse$species)
## [1] "White-tailed deer" "Other" "Moose"
## [4] "Black bear" "Unknown bear" "Unknown"
## [7] "Unknown ungulate" "Human" "Mule deer"
## [10] "Elk" "Owl" "Coyote"
## [13] "Red fox" "Grey wolf" "Lynx"
## [16] "ATVer" "Unknown deer" "Grizzly bear"
## [19] "Snowshoe hare" "Fisher" "Ruffed grouse"
## [22] "Red squirrel" "Raven" "Cougar"
## [25] "Spruce grouse" "Marten" "Domestic dog"
## [28] "Staff"
Then we can filter to how many of those were identified to species as mammals
mammals <- c('White-tailed deer',
'Moose',
'Black bear',
'Mule deer',
'Elk',
'Snowshoe hare',
'Grizzly bear',
'Caribou',
'Coyote',
'Fisher',
'Grey wolf',
'Lynx',
'Red fox',
'White-tailed deer',
'Red squirrel',
'Cougar')
timelapse_mammals <- timelapse %>%
filter(species %in% mammals) %>%
droplevels()
# check the levels
levels(timelapse_mammals$species)
## [1] "White-tailed deer" "Moose" "Black bear"
## [4] "Mule deer" "Elk" "Coyote"
## [7] "Red fox" "Grey wolf" "Lynx"
## [10] "Grizzly bear" "Snowshoe hare" "Fisher"
## [13] "Red squirrel" "Cougar"
# Now get summaries of each mammal
timelapse_mammals %>%
group_by(species) %>%
summarise(count = n()) %>%
arrange(desc(count))
## # A tibble: 14 × 2
## species count
## <fct> <int>
## 1 White-tailed deer 31478
## 2 Moose 6842
## 3 Elk 1875
## 4 Black bear 885
## 5 Coyote 709
## 6 Mule deer 666
## 7 Snowshoe hare 268
## 8 Grey wolf 114
## 9 Grizzly bear 102
## 10 Lynx 63
## 11 Red fox 60
## 12 Red squirrel 27
## 13 Cougar 24
## 14 Fisher 16
Now we can start the analysis prep.
First we need to read in the proportional detection (response metrics) and covariate (explanatory metrics) data files for all 6 LUs (fiscal years 2021-2022 and 2022-2023)
We have multiple response metric files that were generated in script #1, since there were a few very rare species we want to look at, let’s load in both files for now and make sure they are formatted properly for the analysis
I’m going to load them all at once using purrr and we can separate them later depending on what we want to use them for
response_metrics <- file.path('data/processed',
# provide file names
c('srfn_proportional_detections.csv',
'srfn_total_detections.csv',
'srfn_presence_absence.csv')) %>%
# use purrr map to read in all files
map(~.x %>%
# use tidyverse read_csv
read_csv(.,
# specify how some columns are read in
col_types = cols(site = col_factor())) %>%
# set column names to lowercase for coding ease
set_names(
names(.) %>%
tolower())) %>%
# set names for list items
purrr::set_names('prop_detections',
'total_detections',
'presence_absence')
str(response_metrics)
## List of 3
## $ prop_detections : spc_tbl_ [63 × 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ site : Factor w/ 63 levels "LUAG_119","LUAG_124",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ black_bear : num [1:63] 4 1 3 1 0 0 6 3 0 4 ...
## ..$ coyote : num [1:63] 8 2 6 2 0 1 11 7 1 8 ...
## ..$ red_fox : num [1:63] 1 1 1 1 0 0 1 2 0 0 ...
## ..$ white-tailed_deer : num [1:63] 13 12 9 16 11 7 14 8 5 11 ...
## ..$ fisher : num [1:63] 0 1 0 0 0 0 0 1 0 0 ...
## ..$ grey_wolf : num [1:63] 0 1 0 1 0 0 2 3 0 3 ...
## ..$ grizzly_bear : num [1:63] 0 1 0 0 0 0 0 0 0 0 ...
## ..$ marten : num [1:63] 0 1 2 0 0 0 0 0 0 0 ...
## ..$ snowshoe_hare : num [1:63] 0 5 1 0 1 5 1 3 0 0 ...
## ..$ elk : num [1:63] 0 0 7 2 0 0 0 0 0 8 ...
## ..$ moose : num [1:63] 0 0 4 8 11 1 7 7 13 3 ...
## ..$ mule_deer : num [1:63] 0 0 1 3 0 1 0 0 0 0 ...
## ..$ lynx : num [1:63] 0 0 0 0 0 1 1 2 0 0 ...
## ..$ cougar : num [1:63] 0 0 0 0 0 0 1 0 0 0 ...
## ..$ absent_black_bear : num [1:63] 9 12 6 15 16 14 8 10 14 8 ...
## ..$ absent_coyote : num [1:63] 8 13 5 17 19 17 6 9 17 7 ...
## ..$ absent_red_fox : num [1:63] 15 14 10 18 19 18 16 14 18 15 ...
## ..$ absent_white-tailed_deer: num [1:63] 3 3 2 3 8 11 3 8 13 4 ...
## ..$ absent_fisher : num [1:63] 16 14 11 19 19 18 17 15 18 15 ...
## ..$ absent_grey_wolf : num [1:63] 16 14 11 18 19 18 15 13 18 12 ...
## ..$ absent_grizzly_bear : num [1:63] 13 12 9 16 16 14 14 13 14 12 ...
## ..$ absent_marten : num [1:63] 16 14 9 19 19 18 17 16 18 15 ...
## ..$ absent_snowshoe_hare : num [1:63] 16 10 10 19 18 13 16 13 18 15 ...
## ..$ absent_elk : num [1:63] 16 15 4 17 19 18 17 16 18 7 ...
## ..$ absent_moose : num [1:63] 16 15 7 11 8 17 10 9 5 12 ...
## ..$ absent_mule_deer : num [1:63] 16 15 10 16 19 17 17 16 18 15 ...
## ..$ absent_lynx : num [1:63] 16 15 11 19 19 17 16 14 18 15 ...
## ..$ absent_cougar : num [1:63] 16 15 11 19 19 18 16 16 18 15 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. black_bear = col_double(),
## .. .. coyote = col_double(),
## .. .. red_fox = col_double(),
## .. .. `white-tailed_deer` = col_double(),
## .. .. fisher = col_double(),
## .. .. grey_wolf = col_double(),
## .. .. grizzly_bear = col_double(),
## .. .. marten = col_double(),
## .. .. snowshoe_hare = col_double(),
## .. .. elk = col_double(),
## .. .. moose = col_double(),
## .. .. mule_deer = col_double(),
## .. .. lynx = col_double(),
## .. .. cougar = col_double(),
## .. .. absent_black_bear = col_double(),
## .. .. absent_coyote = col_double(),
## .. .. absent_red_fox = col_double(),
## .. .. `absent_white-tailed_deer` = col_double(),
## .. .. absent_fisher = col_double(),
## .. .. absent_grey_wolf = col_double(),
## .. .. absent_grizzly_bear = col_double(),
## .. .. absent_marten = col_double(),
## .. .. absent_snowshoe_hare = col_double(),
## .. .. absent_elk = col_double(),
## .. .. absent_moose = col_double(),
## .. .. absent_mule_deer = col_double(),
## .. .. absent_lynx = col_double(),
## .. .. absent_cougar = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ total_detections: spc_tbl_ [63 × 33] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ array : chr [1:63] "LUAG" "LUAG" "LUAG" "LUBF" ...
## ..$ site_number : num [1:63] 119 124 137 104 105 12 121 126 130 132 ...
## ..$ site : Factor w/ 63 levels "LUAG_119","LUAG_124",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ black_bear : num [1:63] 4 2 17 1 0 0 18 5 0 12 ...
## ..$ coyote : num [1:63] 21 3 23 2 0 1 28 21 2 27 ...
## ..$ red_fox : num [1:63] 4 1 2 1 0 0 1 2 0 0 ...
## ..$ white-tailed_deer: num [1:63] 62 141 153 48 22 32 187 187 10 666 ...
## ..$ domestic_dog : num [1:63] 0 1 1 0 0 0 0 0 0 0 ...
## ..$ fisher : num [1:63] 0 1 0 0 0 0 0 1 0 0 ...
## ..$ grey_wolf : num [1:63] 0 2 0 1 0 0 4 6 0 26 ...
## ..$ grizzly_bear : num [1:63] 0 1 0 0 0 0 0 0 0 0 ...
## ..$ marten : num [1:63] 0 1 3 0 0 0 0 0 0 0 ...
## ..$ snowshoe_hare : num [1:63] 0 20 1 0 1 21 2 19 0 0 ...
## ..$ elk : num [1:63] 0 0 43 2 0 0 0 0 0 66 ...
## ..$ moose : num [1:63] 0 0 4 9 20 1 16 22 148 3 ...
## ..$ mule_deer : num [1:63] 0 0 4 8 0 1 0 0 0 0 ...
## ..$ unknown : num [1:63] 0 0 2 3 0 9 1 0 12 1 ...
## ..$ unknown_ungulate : num [1:63] 0 0 41 4 0 0 0 0 7 0 ...
## ..$ lynx : num [1:63] 0 0 0 0 0 1 1 3 0 0 ...
## ..$ cougar : num [1:63] 0 0 0 0 0 0 1 0 0 0 ...
## ..$ red_squirrel : num [1:63] 0 0 0 0 0 0 0 1 0 0 ...
## ..$ other : num [1:63] 0 0 0 0 0 0 0 0 14 0 ...
## ..$ atver : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ unknown_deer : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ unknown_bear : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ other_birds : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ human : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ staff : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ spruce_grouse : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ hunter : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ruffed_grouse : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ raven : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ owl : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. array = col_character(),
## .. .. site_number = col_double(),
## .. .. site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. black_bear = col_double(),
## .. .. coyote = col_double(),
## .. .. red_fox = col_double(),
## .. .. `white-tailed_deer` = col_double(),
## .. .. domestic_dog = col_double(),
## .. .. fisher = col_double(),
## .. .. grey_wolf = col_double(),
## .. .. grizzly_bear = col_double(),
## .. .. marten = col_double(),
## .. .. snowshoe_hare = col_double(),
## .. .. elk = col_double(),
## .. .. moose = col_double(),
## .. .. mule_deer = col_double(),
## .. .. unknown = col_double(),
## .. .. unknown_ungulate = col_double(),
## .. .. lynx = col_double(),
## .. .. cougar = col_double(),
## .. .. red_squirrel = col_double(),
## .. .. other = col_double(),
## .. .. atver = col_double(),
## .. .. unknown_deer = col_double(),
## .. .. unknown_bear = col_double(),
## .. .. other_birds = col_double(),
## .. .. human = col_double(),
## .. .. staff = col_double(),
## .. .. spruce_grouse = col_double(),
## .. .. hunter = col_double(),
## .. .. ruffed_grouse = col_double(),
## .. .. raven = col_double(),
## .. .. owl = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ presence_absence: spc_tbl_ [63 × 33] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ array : chr [1:63] "LUAG" "LUAG" "LUAG" "LUBF" ...
## ..$ site_number : num [1:63] 119 124 137 104 105 12 121 126 130 132 ...
## ..$ site : Factor w/ 63 levels "LUAG_119","LUAG_124",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ black_bear : num [1:63] 1 1 1 1 0 0 1 1 0 1 ...
## ..$ coyote : num [1:63] 1 1 1 1 0 1 1 1 1 1 ...
## ..$ red_fox : num [1:63] 1 1 1 1 0 0 1 1 0 0 ...
## ..$ white-tailed_deer: num [1:63] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ domestic_dog : num [1:63] 0 1 1 0 0 0 0 0 0 0 ...
## ..$ fisher : num [1:63] 0 1 0 0 0 0 0 1 0 0 ...
## ..$ grey_wolf : num [1:63] 0 1 0 1 0 0 1 1 0 1 ...
## ..$ grizzly_bear : num [1:63] 0 1 0 0 0 0 0 0 0 0 ...
## ..$ marten : num [1:63] 0 1 1 0 0 0 0 0 0 0 ...
## ..$ snowshoe_hare : num [1:63] 0 1 1 0 1 1 1 1 0 0 ...
## ..$ elk : num [1:63] 0 0 1 1 0 0 0 0 0 1 ...
## ..$ moose : num [1:63] 0 0 1 1 1 1 1 1 1 1 ...
## ..$ mule_deer : num [1:63] 0 0 1 1 0 1 0 0 0 0 ...
## ..$ unknown : num [1:63] 0 0 1 1 0 1 1 0 1 1 ...
## ..$ unknown_ungulate : num [1:63] 0 0 1 1 0 0 0 0 1 0 ...
## ..$ lynx : num [1:63] 0 0 0 0 0 1 1 1 0 0 ...
## ..$ cougar : num [1:63] 0 0 0 0 0 0 1 0 0 0 ...
## ..$ red_squirrel : num [1:63] 0 0 0 0 0 0 0 1 0 0 ...
## ..$ other : num [1:63] 0 0 0 0 0 0 0 0 1 0 ...
## ..$ atver : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ unknown_deer : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ unknown_bear : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ other_birds : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ human : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ staff : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ spruce_grouse : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ hunter : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ ruffed_grouse : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ raven : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ owl : num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. array = col_character(),
## .. .. site_number = col_double(),
## .. .. site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. .. black_bear = col_double(),
## .. .. coyote = col_double(),
## .. .. red_fox = col_double(),
## .. .. `white-tailed_deer` = col_double(),
## .. .. domestic_dog = col_double(),
## .. .. fisher = col_double(),
## .. .. grey_wolf = col_double(),
## .. .. grizzly_bear = col_double(),
## .. .. marten = col_double(),
## .. .. snowshoe_hare = col_double(),
## .. .. elk = col_double(),
## .. .. moose = col_double(),
## .. .. mule_deer = col_double(),
## .. .. unknown = col_double(),
## .. .. unknown_ungulate = col_double(),
## .. .. lynx = col_double(),
## .. .. cougar = col_double(),
## .. .. red_squirrel = col_double(),
## .. .. other = col_double(),
## .. .. atver = col_double(),
## .. .. unknown_deer = col_double(),
## .. .. unknown_bear = col_double(),
## .. .. other_birds = col_double(),
## .. .. human = col_double(),
## .. .. staff = col_double(),
## .. .. spruce_grouse = col_double(),
## .. .. hunter = col_double(),
## .. .. ruffed_grouse = col_double(),
## .. .. raven = col_double(),
## .. .. owl = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
First checks look good, but let’s remove a few species we aren’t interested in modeling for this analysis from all three data sets
# first create focal species object with only species we have enough data to model in one response metric or another
glm_focal_vars <- c('site',
'white-tailed_deer',
'black_bear',
'coyote',
'elk',
'grey_wolf',
'grizzly_bear',
'lynx',
'moose',
'mule_deer',
'red_fox',
'snowshoe_hare')
response_metrics_subset <- response_metrics %>%
map(~.x %>%
# use purrr map to select only columns that match the focal species in all data sets
select(matches(paste(glm_focal_vars,
collapse = '|')))
)
Now we should subset each data frame individually even further
We prioritize the proportional monthly detection data as it gives us the most information, so for any species we have enough (we think) data for we will keep them in this data frame, otherwise we will remove
response_metrics_subset$prop_detections <- response_metrics_subset$prop_detections %>%
# remove mule deer, red fox, and grizzly bear
select(-contains(c('red_fox',
'mule_deer',
'grizzly_bear')))
head(response_metrics_subset$prop_detections)
## # A tibble: 6 × 17
## site black_bear coyote `white-tailed_deer` grey_wolf snowshoe_hare elk
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 LUAG_119 4 8 13 0 0 0
## 2 LUAG_124 1 2 12 1 5 0
## 3 LUAG_137 3 6 9 0 1 7
## 4 LUBF_104 1 2 16 1 0 2
## 5 LUBF_105 0 0 11 0 1 0
## 6 LUBF_12 0 1 7 0 5 0
## # ℹ 10 more variables: moose <dbl>, lynx <dbl>, absent_black_bear <dbl>,
## # absent_coyote <dbl>, `absent_white-tailed_deer` <dbl>,
## # absent_grey_wolf <dbl>, absent_snowshoe_hare <dbl>, absent_elk <dbl>,
## # absent_moose <dbl>, absent_lynx <dbl>
Now moving onto our alternative response metrics, I’m not sure which we will use yet for the remaining three species so we will keep all three in both
Basically this code will be the inverse of the code above to simply keep only those species instead of remove them
# presence absence
response_metrics_subset$presence_absence <- response_metrics_subset$presence_absence %>%
# remove mule deer, red fox, and grizzly bear
select(contains(c('site',
'elk',
'red_fox',
'mule_deer',
'grizzly_bear')))
head(response_metrics_subset$presence_absence)
## # A tibble: 6 × 6
## site_number site elk red_fox mule_deer grizzly_bear
## <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 119 LUAG_119 0 1 0 0
## 2 124 LUAG_124 0 1 0 1
## 3 137 LUAG_137 1 1 1 0
## 4 104 LUBF_104 1 1 1 0
## 5 105 LUBF_105 0 0 0 0
## 6 12 LUBF_12 0 0 1 0
# total detections
response_metrics_subset$total_detections <- response_metrics_subset$total_detections %>%
# remove mule deer, red fox, and grizzly bear
select(contains(c('site',
'red_fox',
'mule_deer',
'grizzly_bear')))
head(response_metrics_subset$total_detections)
## # A tibble: 6 × 5
## site_number site red_fox mule_deer grizzly_bear
## <dbl> <fct> <dbl> <dbl> <dbl>
## 1 119 LUAG_119 4 0 0
## 2 124 LUAG_124 1 0 1
## 3 137 LUAG_137 2 4 0
## 4 104 LUBF_104 1 8 0
## 5 105 LUBF_105 0 0 0
## 6 12 LUBF_12 0 1 0
Okay now that these are cleaned up we can remove the full response metrics list from our environment so we don’t use it
rm(response_metrics)
We also need our potential explanatory variables
In the previous script, 2_ACME_SRFN_landscape_covariate_exploration_script.Rmd we cleaned up the covariates data let’s also read that in as we will need to join it with the response metric data to run the models
covariates <- read_csv('data/processed/srfn_covariates_grouped.csv',
# also specify how site is read in
col_types = cols(site = col_factor())
)
str(covariates)
## spc_tbl_ [1,200 × 18] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ site_number : num [1:1200] 1 2 4 6 10 12 13 17 18 21 ...
## $ site : Factor w/ 60 levels "LUD_1","LUC_2",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ buff_dist : num [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
## $ harvest : num [1:1200] 0.432 0.342 0 0.388 0.424 ...
## $ harvest_2000 : num [1:1200] 0.355 0 0 0.179 0.424 ...
## $ harvest_pre2000: num [1:1200] 0.0763 0.3418 0 0.209 0 ...
## $ pipeline : num [1:1200] 0 0.148 0.0148 0 0 ...
## $ roads : num [1:1200] 0.00 5.99e-02 7.05e-03 7.11e-06 6.75e-03 ...
## $ seismic_lines : num [1:1200] 0.00 5.41e-05 0.00 0.00 0.00 ...
## $ veg_edges : num [1:1200] 0 0.09955 0.0129 0.00112 0.01425 ...
## $ wells : num [1:1200] 0 0 0.0183 0.0318 0.0332 ...
## $ lc_agriculture : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_broadleaf : num [1:1200] 0 0.18 0 0 0 ...
## $ lc_coniferous : num [1:1200] 0.847 0 0.743 0.442 0.284 ...
## $ lc_developed : num [1:1200] 0 0.4514 0.0716 0.00837 0.04522 ...
## $ lc_grassland : num [1:1200] 0 0.3608 0.0618 0 0 ...
## $ lc_mixed : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_shrub : num [1:1200] 0.15301 0.00776 0.12401 0.54941 0.6703 ...
## - attr(*, "spec")=
## .. cols(
## .. site_number = col_double(),
## .. site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. buff_dist = col_double(),
## .. harvest = col_double(),
## .. harvest_2000 = col_double(),
## .. harvest_pre2000 = col_double(),
## .. pipeline = col_double(),
## .. roads = col_double(),
## .. seismic_lines = col_double(),
## .. veg_edges = col_double(),
## .. wells = col_double(),
## .. lc_agriculture = col_double(),
## .. lc_broadleaf = col_double(),
## .. lc_coniferous = col_double(),
## .. lc_developed = col_double(),
## .. lc_grassland = col_double(),
## .. lc_mixed = col_double(),
## .. lc_shrub = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(covariates)
## site_number site buff_dist harvest
## Min. : 1.00 LUD_1 : 20 Min. : 250 Min. :0.0000
## 1st Qu.: 34.00 LUC_2 : 20 1st Qu.:1438 1st Qu.:0.0879
## Median : 65.00 LUC_4 : 20 Median :2625 Median :0.2466
## Mean : 69.82 LUS_6 : 20 Mean :2625 Mean :0.2517
## 3rd Qu.:107.75 LUS_10 : 20 3rd Qu.:3812 3rd Qu.:0.3814
## Max. :141.00 LUBF_12: 20 Max. :5000 Max. :0.9863
## (Other):1080
## harvest_2000 harvest_pre2000 pipeline roads
## Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.008954 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.002420
## Median :0.133834 Median :0.05305 Median :0.00450 Median :0.007065
## Mean :0.142348 Mean :0.09638 Mean :0.01031 Mean :0.007511
## 3rd Qu.:0.221274 3rd Qu.:0.14270 3rd Qu.:0.01523 3rd Qu.:0.011097
## Max. :0.856826 Max. :0.98631 Max. :0.14867 Max. :0.059875
##
## seismic_lines veg_edges wells lc_agriculture
## Min. :0.000000 Min. :0.000000 Min. :0.0000000 Min. :0.00000
## 1st Qu.:0.001827 1st Qu.:0.003484 1st Qu.:0.0008336 1st Qu.:0.00000
## Median :0.003612 Median :0.012634 Median :0.0089454 Median :0.00000
## Mean :0.004028 Mean :0.013908 Mean :0.0103535 Mean :0.03587
## 3rd Qu.:0.005451 3rd Qu.:0.021155 3rd Qu.:0.0175866 3rd Qu.:0.00000
## Max. :0.030028 Max. :0.099551 Max. :0.0957837 Max. :0.49000
##
## lc_broadleaf lc_coniferous lc_developed lc_grassland
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.1463 1st Qu.:0.03179 1st Qu.:0.01782 1st Qu.:0.006635
## Median :0.3010 Median :0.23137 Median :0.05463 Median :0.034291
## Mean :0.3502 Mean :0.23902 Mean :0.05948 Mean :0.055123
## 3rd Qu.:0.5250 3rd Qu.:0.38303 3rd Qu.:0.08856 3rd Qu.:0.068804
## Max. :1.0000 Max. :0.84699 Max. :0.45140 Max. :0.883334
##
## lc_mixed lc_shrub
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.04624
## Median :0.01965 Median :0.10172
## Mean :0.04277 Mean :0.15602
## 3rd Qu.:0.06350 3rd Qu.:0.20080
## Max. :0.93137 Max. :0.93212
##
We do need to subset the data so we have separate data frames for each buffer width to work with in the analysis AND to explore correlations between variables at each buffer width, as these may very with spatial scales
Let’s use a for loop to subset the data, thanks Andrew!
buffer_frames <- list()
for (i in unique(covariates$buff_dist)){
print(i)
# Subset data based on radius
df <- covariates %>%
filter(buff_dist == i)
# list of dataframes
buffer_frames <-c (buffer_frames, list(df))
}
## [1] 250
## [1] 500
## [1] 750
## [1] 1000
## [1] 1250
## [1] 1500
## [1] 1750
## [1] 2000
## [1] 2250
## [1] 2500
## [1] 2750
## [1] 3000
## [1] 3250
## [1] 3500
## [1] 3750
## [1] 4000
## [1] 4250
## [1] 4500
## [1] 4750
## [1] 5000
# name list objects so we can extract names for plotting
buffer_frames <- buffer_frames %>%
# absurdly long way to do this but for sake of time fuck it
purrr::set_names('250 meter buffer',
'500 meter buffer',
'750 meter buffer',
'1000 meter buffer',
'1250 meter buffer',
'1500 meter buffer',
'1750 meter buffer',
'2000 meter buffer',
'2250 meter buffer',
'2500 meter buffer',
'2750 meter buffer',
'3000 meter buffer',
'3250 meter buffer',
'3500 meter buffer',
'3750 meter buffer',
'4000 meter buffer',
'4250 meter buffer',
'4500 meter buffer',
'4750 meter buffer',
'5000 meter buffer')
Now we have a list with data frames for each buffer width which we can work with later.
Now that we have the covariate data formatted we need to the response metric data frames that we will use for each species and tidy up the resulting data
We will want separate data frames for each type of response metric - we will use the proportional detection data for all the species we have enough detections for, otherwise we will use either the total detections or presence absence data depending how the models fit.
# proportional detections
prop_det_data <- buffer_frames %>%
# use purrr to join data to all individual buffer frames data sets
purrr::map(
~.x %>%
# use left join so only sites with covariate data are kept
left_join(response_metrics_subset$prop_detections,
by = 'site'))
# total detections
total_det_data <- buffer_frames %>%
# use purrr to join data to all individual buffer frames data sets
purrr::map(
~.x %>%
# use left join so only sites with covariate data are kept
left_join(response_metrics_subset$total_detections,
by = 'site'))
# presence absence
pres_absen_dat <- buffer_frames %>%
# use purrr to join data to all individual buffer frames data sets
purrr::map(
~.x %>%
# use inner join so only sites with both data
inner_join(response_metrics_subset$presence_absence,
by = 'site') %>%
mutate(across(c(elk:grizzly_bear), as.factor)))
I’m going to view each of these through the RStudio environemnt to look at them in depth and will print a subset of each ehre
head(prop_det_data$`250 meter buffer`)
## # A tibble: 6 × 34
## site_number site buff_dist harvest harvest_2000 harvest_pre2000 pipeline
## <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 LUD_1 250 0.432 0.355 0.0763 0
## 2 2 LUC_2 250 0.342 0 0.342 0.148
## 3 4 LUC_4 250 0 0 0 0.0148
## 4 6 LUS_6 250 0.388 0.179 0.209 0
## 5 10 LUS_10 250 0.424 0.424 0 0
## 6 12 LUBF_12 250 0 0 0 0.0752
## # ℹ 27 more variables: roads <dbl>, seismic_lines <dbl>, veg_edges <dbl>,
## # wells <dbl>, lc_agriculture <dbl>, lc_broadleaf <dbl>, lc_coniferous <dbl>,
## # lc_developed <dbl>, lc_grassland <dbl>, lc_mixed <dbl>, lc_shrub <dbl>,
## # black_bear <dbl>, coyote <dbl>, `white-tailed_deer` <dbl>, grey_wolf <dbl>,
## # snowshoe_hare <dbl>, elk <dbl>, moose <dbl>, lynx <dbl>,
## # absent_black_bear <dbl>, absent_coyote <dbl>,
## # `absent_white-tailed_deer` <dbl>, absent_grey_wolf <dbl>, …
str(pres_absen_dat$`2000 meter buffer`)
## tibble [59 × 23] (S3: tbl_df/tbl/data.frame)
## $ site_number.x : num [1:59] 1 2 4 6 10 12 13 17 18 21 ...
## $ site : Factor w/ 64 levels "LUD_1","LUC_2",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ buff_dist : num [1:59] 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
## $ harvest : num [1:59] 0.381 0.476 0.252 0.421 0.576 ...
## $ harvest_2000 : num [1:59] 0.195 0.284 0.232 0.243 0.224 ...
## $ harvest_pre2000: num [1:59] 0.186 0.1917 0.0194 0.1777 0.3515 ...
## $ pipeline : num [1:59] 0.00509 0.042 0.03546 0.02548 0.00134 ...
## $ roads : num [1:59] 0.00872 0.01885 0.01674 0.01398 0.00797 ...
## $ seismic_lines : num [1:59] 0.006809 0.003832 0.007541 0.004027 0.000759 ...
## $ veg_edges : num [1:59] 0.0157 0.0332 0.0289 0.0229 0.0115 ...
## $ wells : num [1:59] 0.0164 0.0359 0.0337 0.0226 0.011 ...
## $ lc_agriculture : num [1:59] 0 0 0 0 0 0 0 0 0 0 ...
## $ lc_broadleaf : num [1:59] 0.035 0.0264 0.1201 0.1349 0.1353 ...
## $ lc_coniferous : num [1:59] 0.398 0.55 0.576 0.386 0.375 ...
## $ lc_developed : num [1:59] 0.0772 0.1714 0.1277 0.133 0.0555 ...
## $ lc_grassland : num [1:59] 0 0.08906 0.0449 0.04254 0.00478 ...
## $ lc_mixed : num [1:59] 0 0.00275 0 0.00869 0 ...
## $ lc_shrub : num [1:59] 0.49 0.16 0.132 0.294 0.429 ...
## $ site_number.y : num [1:59] 1 2 4 6 10 12 13 17 18 21 ...
## $ elk : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 1 1 1 ...
## $ red_fox : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 2 1 1 1 ...
## $ mule_deer : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
## $ grizzly_bear : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 1 ...
Let’s remove the objects we no longer need from the environment to keep our work space clean
rm(buffer_frames,
covariates,
df,
response_metrics_subset)
Let’s also save this so we can use it in a later script to make figures
saveRDS(prop_det_data, 'data/processed/prop_det_data.rds')
Before we can develop any models we need to look at your covariates and see if there are any major violations of the model assumptions of independence, e.g. variables that are highly correlated - in which case we need to choose which variable in each pair to include in a model
prop_det_data %>%
purrr::map(
~.x %>%
# select only columns with covariates not other info to simplify the plot a bit
select(harvest:lc_shrub) %>%
# use chart.correlation to produce plots for each buffer size
chart.Correlation(.,
histogram = TRUE,
method = "pearson")
)